Convergence of Contrastive Divergence with Annealed Learning Rate in Exponential Family

Authors

  • Bai Jiang
  • Tung-yu Wu
  • Wing H. Wong
Abstract

In our recent paper [11], we showed that in the exponential family, contrastive divergence (CD) with a fixed learning rate gives asymptotically consistent estimates. In this paper, we establish the consistency and convergence rate of CD with an annealed learning rate $\eta_t$. Specifically, suppose CD-$m$ generates the sequence of parameters $\{\theta_t\}_{t \ge 0}$ using an i.i.d. data sample $X_1^n \sim p_{\theta^*}$ of size $n$; then $\delta_n(X_1^n) = \limsup_{t \to \infty} \left\| \sum_{s=t_0}^{t} \eta_s \theta_s \big/ \sum_{s=t_0}^{t} \eta_s - \theta^* \right\|$ converges in probability to 0 at a rate of $1/\sqrt[3]{n}$. The number $m$ of MCMC transitions in CD affects only the coefficient factor of the convergence rate. Our proof is not a simple extension of the one in [11], which depends critically on the fact that $\{\theta_t\}_{t \ge 0}$ is a homogeneous Markov chain conditional on the observed sample $X_1^n$. Under an annealed learning rate, the homogeneous Markov property no longer holds, so we develop an alternative approach based on super-martingales. Experimental results of CD on a fully visible $2 \times 2$ Boltzmann machine are provided to demonstrate our theoretical results.
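
As a concrete illustration of the setting described above, the sketch below runs CD-$m$ with an annealed learning rate on a fully visible Boltzmann machine with $\{0,1\}$-valued units and returns the $\eta$-weighted average of the iterates, $\sum_{s \ge t_0} \eta_s \theta_s / \sum_{s \ge t_0} \eta_s$. This is a minimal sketch under assumptions not stated in the abstract: the helper names (`gibbs_sweep`, `cd_m_annealed`), the schedule $\eta_t = \eta_0 / (t+1)^{\alpha}$, and the burn-in index $t_0$ are illustrative choices, not the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_sweep(x, W, b, rng):
    """One full Gibbs sweep over the units of a fully visible Boltzmann machine
    with energy E(x) = -0.5 * x^T W x - b^T x (W symmetric, zero diagonal)."""
    x = x.copy()
    for i in range(x.size):
        p = sigmoid(W[i] @ x + b[i])      # p(x_i = 1 | rest); W[i, i] == 0
        x[i] = 1.0 if rng.random() < p else 0.0
    return x

def cd_m_annealed(X, m=1, T=20000, eta0=0.5, alpha=1.0, t0=100, seed=0):
    """CD-m with annealed learning rate eta_t = eta0 / (t + 1)**alpha (hypothetical
    schedule). Returns the eta-weighted average of the iterates from step t0 onward,
    mirroring the averaged estimator in the abstract."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W, b = np.zeros((d, d)), np.zeros(d)
    avg_W, avg_b, eta_sum = np.zeros((d, d)), np.zeros(d), 0.0
    for t in range(T):
        eta = eta0 / (t + 1.0) ** alpha
        x_data = X[rng.integers(n)]                    # positive phase: one data point
        x_model = x_data.copy()
        for _ in range(m):                             # negative phase: m Gibbs sweeps
            x_model = gibbs_sweep(x_model, W, b, rng)
        # CD-m update: data statistics minus m-step sample statistics
        W += eta * (np.outer(x_data, x_data) - np.outer(x_model, x_model))
        np.fill_diagonal(W, 0.0)                       # no self-connections
        b += eta * (x_data - x_model)
        if t >= t0:                                    # accumulate weighted average
            avg_W += eta * W
            avg_b += eta * b
            eta_sum += eta
    return avg_W / eta_sum, avg_b / eta_sum
```

With `X` drawn i.i.d. from a fully visible $2 \times 2$ Boltzmann machine with known parameters $\theta^*$, one can plot the distance between the returned average and $\theta^*$ against the sample size $n$ and compare its decay with the $1/\sqrt[3]{n}$ rate stated in the abstract.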

Related articles

Learning with Blocks: Composite Likelihood and Contrastive Divergence

Composite likelihood methods provide a wide spectrum of computationally efficient techniques for statistical tasks such as parameter estimation and model selection. In this paper, we present a formal connection between the optimization of composite likelihoods and the well-known contrastive divergence algorithm. In particular, we show that composite likelihoods can be stochastically optimized b...

Investigating Convergence of Restricted Boltzmann Machine Learning

Restricted Boltzmann Machines are increasingly popular tools for unsupervised learning. They are very general, can cope with missing data and are used to pretrain deep learning machines. RBMs learn a generative model of the data distribution. As exact gradient ascent on the data likelihood is infeasible, typically Markov Chain Monte Carlo approximations to the gradient such as Contrastive Diver...

Learning Rotation-Aware Features: From Invariant Priors to Equivariant Descriptors Supplemental Material

The R-FoE model of Sec. 3 of the main paper was trained on a database of 5000 natural images (50 × 50 pixels) using persistent contrastive divergence [12] (also known as stochastic maximum likelihood). Learning was done with stochastic gradient descent using mini-batches of 100 images (and model samples) for a total of 10000 (exponentially smoothed) gradient steps with an annealed learning rate...
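
By way of contrast with the CD-$m$ sketch above, persistent contrastive divergence (stochastic maximum likelihood) keeps the negative-phase Gibbs chain alive across parameter updates instead of restarting it at a data point each step. The fragment below is only a hedged illustration of that difference: it reuses the hypothetical `gibbs_sweep` helper and the `X`, `W`, `b`, `eta0`, `alpha`, `T`, `rng` names from the earlier sketch, and omits the mini-batching, smoothing, and exact schedule mentioned in the supplemental material.

```python
# Persistent CD (sketch): the negative-phase chain survives across updates,
# whereas CD-m restarts it at the current data point on every iteration.
x_persist = X[0].copy()                              # persistent "fantasy" state
for t in range(T):
    eta = eta0 / (t + 1.0) ** alpha                  # annealed learning rate
    x_data = X[rng.integers(len(X))]                 # positive phase: a data point
    x_persist = gibbs_sweep(x_persist, W, b, rng)    # negative phase: continue old chain
    W += eta * (np.outer(x_data, x_data) - np.outer(x_persist, x_persist))
    np.fill_diagonal(W, 0.0)
    b += eta * (x_data - x_persist)
```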

Cystoscopy Image Classification Using Deep Convolutional Neural Networks

In the past three decades, the use of smart methods in medical diagnostic systems has attracted the attention of many researchers. However, no smart activity has been provided in the field of medical image processing for diagnosis of bladder cancer through cystoscopy images despite the high prevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...

The Convergence of Contrastive Divergences

This paper analyses the Contrastive Divergence algorithm for learning statistical parameters. We relate the algorithm to the stochastic approximation literature. This enables us to specify conditions under which the algorithm is guaranteed to converge to the optimal solution (with probability 1). This includes necessary and sufficient conditions for the solution to be unbiased.
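
For reference, the step-size conditions most commonly imposed in the stochastic approximation (Robbins-Monro) literature are sketched below; an annealed schedule such as $\eta_t = \eta_0 / (t+1)$ satisfies both. Whether these are exactly the conditions used in this related paper is not stated in the excerpt above.

```latex
% Standard Robbins-Monro step-size conditions (illustrative; the cited
% paper's precise assumptions may differ):
\sum_{t=0}^{\infty} \eta_t = \infty,
\qquad
\sum_{t=0}^{\infty} \eta_t^2 < \infty .
```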


Journal:
  • CoRR

Volume: abs/1605.06220   Issue: -

Pages: -

Publication year: 2016